Smart Paper Notes

Function Approximation for High-Energy Physics: Comparing Machine Learning and Interpolation Methods

Ibrahim Chahrour , James D. Wells

Category: Machine Learning

2021-11-29

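The need to approximate functions is ubiquitous in science, either because of empirical constraints or because of the high computational cost of accessing the function. In high-energy physics, the precise computation of the scattering cross-section of a process requires the evaluation of computationally intensive integrals. A wide variety of machine learning methods have been used to tackle this problem, but the motivation for choosing one method over another is often lacking. Comparing these methods is typically highly dependent on the problem at hand, so we restrict ourselves to the case where we can evaluate the function a large number of times, after which quick and accurate evaluation can take place. We consider four interpolation and three machine learning techniques and compare their performance on three toy functions, the four-point scalar Passarino-Veltman $D_0$ function, and the two-loop self-energy master integral $M$. We find that in low dimensions ($d = 3$), traditional interpolation techniques such as the radial basis function perform very well, but in higher dimensions ($d = 5, 6, 9$) multi-layer perceptrons (a.k.a. neural networks) do not suffer as much from the curse of dimensionality and provide the fastest and most accurate predictions.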
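As a rough illustration of the radial basis function interpolation mentioned in the abstract, the following minimal sketch fits an interpolant to a toy function in $d = 3$. It is not the authors' code: the linear kernel $\phi(r) = r$, the 4x4x4 grid of nodes, the toy function, and all function names are assumptions chosen for illustration only.

```python
import math


def linear_rbf(r):
    """Linear radial kernel phi(r) = r (an assumed choice of kernel)."""
    return r


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot to keep elimination stable.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def fit_rbf(nodes, values):
    """Find weights w such that sum_j w_j * phi(|x_i - x_j|) = f(x_i)."""
    kernel = [[linear_rbf(math.dist(p, q)) for q in nodes] for p in nodes]
    return solve_linear(kernel, values)


def predict_rbf(nodes, weights, x):
    """Evaluate the fitted interpolant at a new point x."""
    return sum(w * linear_rbf(math.dist(p, x)) for p, w in zip(nodes, weights))


def toy_function(x):
    """Hypothetical smooth 3D test function, not one from the paper."""
    return math.sin(x[0]) * math.cos(x[1]) + x[2] ** 2


# Sample the function on a regular 4x4x4 grid in the unit cube.
grid = [i / 3 for i in range(4)]
nodes = [(a, b, c) for a in grid for b in grid for c in grid]
weights = fit_rbf(nodes, [toy_function(p) for p in nodes])

# Compare the interpolant with the true function at an off-grid point.
query = (0.5, 0.5, 0.5)
print(predict_rbf(nodes, weights, query), toy_function(query))
```

The dense linear solve scales cubically with the number of nodes, and a grid with a fixed number of points per axis grows exponentially with $d$, which gives some intuition for why such interpolants become costly in higher dimensions.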

Translated by Google Translate

NusaCrowd: Open Source Initiative for Indonesian NLP Resources

Samuel Cahyawijaya , Holy Lovenia , Alham Fikri Aji , Genta Indra Winata , Bryan Wilie , Rahmad Mahendra , Christian Wibisono , Ade Romadhony , Karissa Vincentio , Fajri Koto

Categories: Natural Language Processing | Artificial Intelligence

2022-12-19

We present NusaCrowd, a collaborative initiative to collect and unite existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, we have brought together 137 datasets and 117 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their effectiveness has been demonstrated in multiple experiments. NusaCrowd's data collection enables the creation of the first zero-shot benchmarks for natural language understanding and generation in Indonesian and its local languages. Furthermore, NusaCrowd enables the creation of the first multilingual automatic speech recognition benchmark in Indonesian and its local languages. Our work is intended to help advance natural language processing research in under-represented languages.


Don't Forget Your ABC's: Evaluating the State-of-the-Art in Chat-Oriented Dialogue Systems

Sarah E. Finch , James D. Finch , Jinho D. Choi

Category: Natural Language Processing

2022-12-18

There has been great recent advancement in human-computer chat. However, proper evaluation currently requires human judgements that produce notoriously high-variance metrics due to their inherent subjectivity. Furthermore, there is little standardization in the methods and labels used for evaluation, with an overall lack of work to compare and assess the validity of various evaluation approaches. As a consequence, existing evaluation results likely leave an incomplete picture of the strengths and weaknesses of open-domain chatbots. We aim towards a dimensional evaluation of human-computer chat that can reliably measure several distinct aspects of chat quality. To this end, we present our novel human evaluation method that quantifies the rate of several quality-related chatbot behaviors. Our results demonstrate our method to be more suitable for dimensional chat evaluation than alternative Likert-style or comparative methods. We then use our validated method and existing methods to evaluate four open-domain chat models from the recent literature.


A Generalized EigenGame with Extensions to Multiview Representation Learning

James Chapman , Ana Lawry Aguila , Lennie Wells

Categories: Machine Learning | Machine Learning (Statistics)

2022-11-21

Generalized Eigenvalue Problems (GEPs) encompass a range of interesting dimensionality reduction methods. Development of efficient stochastic approaches to these problems would allow them to scale to larger datasets. Canonical Correlation Analysis (CCA) is one example of a GEP for dimensionality reduction which has found extensive use in problems with two or more views of the data. Deep learning extensions of CCA require large mini-batch sizes, and therefore large memory consumption, in the stochastic setting to achieve good performance, and this has limited their application in practice. Inspired by the Generalized Hebbian Algorithm, we develop an approach to solving stochastic GEPs in which all constraints are softly enforced by Lagrange multipliers. Then, by considering the integral of this Lagrangian function, its pseudo-utility, and inspired by recent formulations of Principal Components Analysis and GEPs as games with differentiable utilities, we develop a game-theory inspired approach to solving GEPs. We show that our approaches share much of the theoretical grounding of the previous Hebbian and game-theoretic approaches for the linear case, but our method permits extension to general function approximators like neural networks for certain GEPs for dimensionality reduction, including CCA, which means our method can be used for deep multiview representation learning. We demonstrate the effectiveness of our method for solving GEPs in the stochastic setting using canonical multiview datasets and demonstrate state-of-the-art performance for optimizing Deep CCA.


Deep Learning Generates Synthetic Cancer Histology for Explainability and Education

James M. Dolezal , Rachelle Wolk , Hanna M. Hieromnimon , Frederick M. Howard , Andrew Srisuwananukorn , Dmitry Karpeyev , Siddhi Ramesh , Sara Kochanny , Jung Woo Kwon , Meghana Agni

Category: Computer Vision

2022-11-12

Artificial intelligence methods including deep neural networks (DNN) can provide rapid molecular classification of tumors from routine histology with accuracy that matches or exceeds human pathologists. Discerning how neural networks make their predictions remains a significant challenge, but explainability tools help provide insights into what models have learned when corresponding histologic features are poorly defined. Here, we present a method for improving explainability of DNN models using synthetic histology generated by a conditional generative adversarial network (cGAN). We show that cGANs generate high-quality synthetic histology images that can be leveraged for explaining DNN models trained to classify molecularly-subtyped tumors, exposing histologic features associated with molecular state. Fine-tuning synthetic histology through class and layer blending illustrates nuanced morphologic differences between tumor subtypes. Finally, we demonstrate the use of synthetic histology for augmenting pathologist-in-training education, showing that these intuitive visualizations can reinforce and improve understanding of histologic manifestations of tumor biology.


Disclosure of a Neuromorphic Starter Kit

James S. Plank , Bryson Gullett , Adam Z. Foshie , Garrett S. Rose , Catherine D. Schuman

Category: Neural and Evolutionary Computing

2022-11-08

This paper presents a Neuromorphic Starter Kit, which has been designed to help a variety of research groups perform research, exploration and real-world demonstrations of brain-based, neuromorphic processors and hardware environments. A prototype kit has been built and tested. We explain the motivation behind the kit, its design and composition, and a prototype physical demonstration.


A Targeted Sampling Strategy for Compressive Cryo Focused Ion Beam Scanning Electron Microscopy

Daniel Nicholls , Jack Wells , Alex W. Robinson , Amirafshar Moshtaghpour , Maryna Kobylynska , Roland A. Fleck , Angus I. Kirkland , Nigel D. Browning

Category: Machine Learning

2022-11-07

Cryo Focused Ion-Beam Scanning Electron Microscopy (cryo FIB-SEM) enables three-dimensional and nanoscale imaging of biological specimens via a slice-and-view mechanism. The FIB-SEM experiments are, however, limited by a slow (typically several hours) acquisition process, and the high electron doses imposed on the beam-sensitive specimen can cause damage. In this work, we present a compressive sensing variant of cryo FIB-SEM capable of reducing the operational electron dose and increasing speed. We propose two Targeted Sampling (TS) strategies that leverage the reconstructed image of the previous sample layer as a prior for designing the next subsampling mask. Our image recovery is based on a blind Bayesian dictionary learning approach, i.e., Beta Process Factor Analysis (BPFA). This method is experimentally viable due to our ultra-fast GPU-based implementation of BPFA. Simulations on artificial compressive FIB-SEM measurements validate the success of the proposed methods: the operational electron dose can be reduced by up to 20 times. These methods have large implications for the cryo FIB-SEM community, in which the imaging of beam-sensitive biological materials without beam damage is crucial.


Correlated Feature Aggregation by Region Helps Distinguish Aggressive from Indolent Clear Cell Renal Cell Carcinoma Subtypes on CT

Karin Stacke , Indrani Bhattacharya , Justin R. Tse , James D. Brooks , Geoffrey A. Sonn , Mirabela Rusu

Category: Computer Vision

2022-09-29

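Renal cell carcinoma (RCC) is a common cancer that varies in clinical behavior. Indolent RCC is often low-grade without necrosis and can be monitored without treatment. Aggressive RCC is often high-grade and can cause metastasis and death if not promptly detected and treated. While most kidney cancers are detected on CT scans, grading is based on histology from invasive biopsy or surgery. Determining aggressiveness on CT images is clinically important because it facilitates risk stratification and treatment planning. This study aims to use machine learning methods to identify radiology features that correlate with features on pathology, to facilitate assessment of cancer aggressiveness on CT images instead of histology. This paper presents a novel automated method, Correlated Feature Aggregation By Region (CorrFABR), for classifying clear cell RCC by leveraging correlations between radiology and corresponding unaligned pathology images. CorrFABR consists of three main steps: (1) feature aggregation, where region-level features are extracted from radiology and pathology images; (2) fusion, where radiology features correlated with pathology features are learned at the region level; and (3) prediction, where the learned correlated features are used to distinguish aggressive from indolent clear cell RCC using CT alone as input. Thus, during training, CorrFABR learns from both radiology and pathology images, but during inference, in the absence of pathology images, CorrFABR distinguishes aggressive from indolent clear cell RCC using CT alone. CorrFABR improved classification performance over radiology features alone, with the binary classification F1-score increasing from 0.68 (0.04) to 0.73 (0.03). This demonstrates the potential of incorporating pathology disease characteristics into the classification of clear cell RCC aggressiveness on CT images.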


Calibrating Ensembles for Scalable Uncertainty Quantification in Deep Learning-based Medical Segmentation

Thomas Buddenkotte , Lorena Escudero Sanchez , Mireia Crispin-Ortuzar , Ramona Woitek , Cathal McCague , James D. Brenton , Ozan Öktem , Evis Sala , Leonardo Rundo

Categories: Machine Learning | Computer Vision

2022-09-20

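Uncertainty quantification in automated image analysis is highly desired in many applications. Typically, machine learning models for classification or segmentation are developed only to provide binary answers. However, quantifying the uncertainty of the models can play a critical role, for example in active learning or human-machine interaction. Uncertainty quantification is especially difficult when using deep learning-based models, which are the state of the art in many imaging applications. Current uncertainty quantification approaches do not scale well to high-dimensional real-world problems. Scalable solutions often rely on classical techniques during inference, or on training ensembles of identical models with different random seeds, to obtain a posterior distribution. In this paper, we show that these approaches fail to approximate the classification probability. Instead, we propose a scalable and intuitive framework to calibrate ensembles of deep learning models to produce uncertainty quantification measurements that approximate the classification probability. On unseen test data, we demonstrate improved calibration, sensitivity (in two out of three cases), and precision compared with standard approaches. We further motivate the use of our method in active learning, in creating pseudo-labels to learn from unlabeled images, and in human-machine collaboration.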
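To make the notion of calibration in this abstract concrete, here is a minimal sketch of how the averaged output of an ensemble could be checked against observed label frequencies. It is not the calibration framework proposed in the paper: the expected calibration error computed below is a common generic diagnostic, and the function names, bin count, and toy numbers are assumptions for illustration.

```python
def ensemble_mean(member_probs):
    """Average per-sample foreground probabilities over ensemble members.

    member_probs holds one list of probabilities per ensemble member.
    """
    n_members = len(member_probs)
    return [sum(sample) / n_members for sample in zip(*member_probs)]


def expected_calibration_error(probs, labels, n_bins=10):
    """Binary expected calibration error with equal-width probability bins.

    In each bin, the mean predicted probability is compared with the
    observed fraction of positive labels; gaps are weighted by bin size.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        positive_rate = sum(y for _, y in members) / len(members)
        ece += len(members) / total * abs(mean_prob - positive_rate)
    return ece


# Hypothetical probabilities from two ensemble members for four pixels.
members = [[0.2, 0.1, 0.8, 0.9], [0.0, 0.1, 1.0, 0.9]]
labels = [0, 0, 1, 1]
mean_probs = ensemble_mean(members)
print(mean_probs, expected_calibration_error(mean_probs, labels))
```

A value near zero means the predicted probabilities match the observed frequencies within each bin. It does not say anything about how accurate the underlying segmentation is.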


Ontologizing Health Systems Data at Scale: Making Translational Discovery a Reality

Tiffany J. Callahan , Adrianne L. Stefanski , Jordan M. Wyrwa , Chenjie Zeng , Anna Ostropolets , Juan M. Banda , William A. Baumgartner Jr. , Richard D. Boyce , Elena Casiraghi , Ben D. Coleman

Category: Artificial Intelligence

2022-09-10

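Common data models solve many challenges of standardizing electronic health record (EHR) data, but they cannot integrate the resources needed for deep phenotyping. Open Biological and Biomedical Ontology (OBO) Foundry ontologies provide computable semantic representations of biological knowledge and enable the integration of a variety of biomedical data. However, mapping EHR data to OBO Foundry ontologies requires significant manual curation and domain expertise. We introduce a framework for mapping Observational Medical Outcomes Partnership (OMOP) standard vocabularies to OBO Foundry ontologies. Using this framework, we produced mappings for 92,367 conditions, 8,615 drug ingredients, and 10,673 measurement results. Domain experts verified mapping accuracy, and when examined across 24 hospitals, the mappings covered 99% of conditions and drug ingredients and 68% of measurements. Finally, we demonstrate that OMOP2OBO mappings can aid in the systematic identification of undiagnosed rare disease patients who might benefit from genetic testing.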
